Abstract: We consider the generalized differential entropy of
normalized sums of independent and identically distributed (IID)
continuous random variables. We prove that the Rényi entropy
and Tsallis entropy of order $\alpha$ ($\alpha > 0$) of the normalized sum of
IID continuous random variables with bounded moments converge
to the corresponding Rényi entropy and Tsallis entropy
of the Gaussian limit, and obtain sharp rates of convergence.
I. Introduction

The Shannon entropy of a random variable $X$ with density
$f : \mathbb{R} \to [0, \infty)$ is defined as

$$H(X) = -\int_{\mathbb{R}} f \log f$$

provided that the integral makes sense [1]. (We use log to
represent the natural logarithm throughout this paper.) It is
interesting to study the convergence of the normalized sums

$$S_n = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} X_i$$

of independent copies of $X$ to the Gaussian limit: the central
limit theorem for independent and identically distributed (IID)
copies of $X$. Without loss of generality, we suppose $E(X) = 0$
and $E(X^2) = 1$. The standard Gaussian distribution is denoted
by $G$.

The idea of tracking the central limit theorem using Shannon
entropy goes back to Linnik [2], who used it to give a
particular proof of the central limit theorem. Barron [3] was
the first to prove a central limit theorem with convergence in
the Shannon entropy sense. He proved that $H(S_n)$ converges
to $H(G) = \log\sqrt{2\pi e}$ if $H(S_n)$ is finite for some $n > 0$.
Notice that $H(S_n)$ is finite for some $n > 0$ if for some $s > 0$,
$E|S_n|^s < \infty$, since

$$H(S_n) \le \frac{1}{s} \log\left[ E|S_n|^s \left(2\Gamma\left(\tfrac{1}{s}\right)\right)^{s} e\, s^{1-s} \right],$$

Hongfei Cui is with Wuhan Institute of Physics and Mathematics, The
Chinese Academy of Sciences, Wuhan 430071, China (e-mail: cui@wipm.ac.cn).
Jianqiang Sun is with Wuhan Institute of Physics and Mathematics, The
Chinese Academy of Sciences, Wuhan 430071, China, and Graduate School of
the Chinese Academy of Science (e-mail: sunjianqiang09@mails.gucas.ac.cn).
Yiming Ding is with Wuhan Institute of Physics and Mathematics, The
Chinese Academy of Sciences, Wuhan 430071, China (e-mail: ding@wipm.ac.cn).
This work was partially supported by National Basic Research Program of
China (973 Program) Grant No. 2011CB707802.



where $\Gamma$ is the Gamma function. Artstein et al. [4] and Johnson
and Barron [5] obtained the rate of convergence

$$|H(S_n) - H(G)| = O\left(\frac{1}{n}\right)$$

provided the density of $X$ satisfies some analytical conditions
[6]. Moreover, the conclusion that $H(S_n)$ is increasing to
$H(G)$ was obtained by Artstein et al. [7], and some simpler
proofs can be found in Tulino and Verdú [8] and Madiman
and Barron [9].

Rényi entropy is a generalization of Shannon entropy [10].
It is one of a family of functionals for quantifying the diversity,
uncertainty or randomness of a system. The Rényi entropy of
order $\alpha$ ($\alpha \in \mathbb{R}$) is defined as

$$R_\alpha(X) = \frac{1}{1-\alpha} \log \int_{\mathbb{R}} f^{\alpha}(x)\,dx, \qquad \alpha \neq 1.$$

By L'Hôpital's rule, $R_\alpha(X) \to H(X)$ as $\alpha \to 1$. The Rényi
entropy of order 2, $R_2(X)$, is called the collision entropy.

The Rényi entropies are important in ecology and statistics
as indices of diversity. Rényi entropies also appear in several
important contexts such as information theory, statistical
estimation, and quantum entanglement [11], [12], [13]. A class
of Rényi entropy estimators for multidimensional densities is
given by Leonenko et al. [14].

Another generalization of Shannon entropy is Tsallis
entropy [15], which is defined as

$$T_\alpha(X) = \frac{1}{\alpha-1}\left(1 - \int_{\mathbb{R}} f^{\alpha}(x)\,dx\right), \qquad \alpha \neq 1.$$

It is easy to see that $T_\alpha(X) \to H(X)$ as $\alpha \to 1$. Historically,
this family of entropies was derived by Havrda and Charvát in
1967 [16]. The Rényi entropy and Tsallis entropy of the same
order $\alpha$ are related by [17]

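$$R_\alpha(X) = \frac{1}{1-\alpha} \log\bigl(1 - (\alpha-1)\,T_\alpha(X)\bigr),$$

which is a one-to-one correspondence between $R_\alpha(X)$ and
$T_\alpha(X)$. In fact, $R_\alpha(X)$ is a monotone increasing function of
$T_\alpha(X)$, because

$$\frac{dR_\alpha(X)}{dT_\alpha(X)} = \frac{1}{1 - (\alpha-1)\,T_\alpha(X)} = \frac{1}{\int_{\mathbb{R}} f^{\alpha}(x)\,dx}$$

provided $\int_{\mathbb{R}} f^{\alpha}(x)\,dx < \infty$.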
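The two definitions and the relation between them can be checked numerically. The following is a minimal sketch, assuming a standard Gaussian density and a plain trapezoidal rule on a truncated interval; the function names and the integration limits are illustrative choices, not part of the paper, and the closed form used for comparison is the standard Gaussian integral.

```python
import math


def gaussian_pdf(x):
    """Density of the standard Gaussian distribution G."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def integral_of_power(pdf, alpha, lo=-40.0, hi=40.0, steps=80_000):
    """Trapezoidal approximation of the integral of pdf(x)**alpha over [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.5 * (pdf(lo) ** alpha + pdf(hi) ** alpha)
    for i in range(1, steps):
        total += pdf(lo + i * h) ** alpha
    return total * h


def renyi_entropy(pdf, alpha):
    """R_alpha = log(integral of f^alpha) / (1 - alpha), alpha != 1."""
    return math.log(integral_of_power(pdf, alpha)) / (1.0 - alpha)


def tsallis_entropy(pdf, alpha):
    """T_alpha = (1 - integral of f^alpha) / (alpha - 1), alpha != 1."""
    return (1.0 - integral_of_power(pdf, alpha)) / (alpha - 1.0)


def renyi_from_tsallis(t_alpha, alpha):
    """Relation R_alpha = log(1 - (alpha - 1) * T_alpha) / (1 - alpha)."""
    return math.log(1.0 - (alpha - 1.0) * t_alpha) / (1.0 - alpha)


def renyi_gaussian_closed_form(alpha):
    """Closed form for the standard Gaussian (assumed here, alpha > 0, alpha != 1)."""
    return 0.5 * math.log(2.0 * math.pi) + math.log(alpha) / (2.0 * (alpha - 1.0))


if __name__ == "__main__":
    for alpha in (0.5, 2.0, 3.0):
        r = renyi_entropy(gaussian_pdf, alpha)
        t = tsallis_entropy(gaussian_pdf, alpha)
        print(alpha, r, t, renyi_from_tsallis(t, alpha))
```

Evaluating `renyi_entropy` at an order close to 1 gives a value close to $\log\sqrt{2\pi e}$, which illustrates the limit $R_\alpha(X) \to H(X)$ as $\alpha \to 1$.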

Parallel to the Shannon case, we consider the convergence
of $R_\alpha(S_n)$ and $T_\alpha(S_n)$ ($\alpha > 0$), and investigate the rates of
convergence.

Our main result is the following theorem.
Main Theorem. Let $X_1, X_2, \ldots, X_n$ be independent copies
of a random variable $X$ with characteristic function $\varphi(t)$.
Suppose the following hold:

1) $E(|X|^k)$ is finite for $k = 1, 2, \ldots$;

2) $|\varphi(t)|^{\nu}$ is integrable for some $\nu \ge 1$.

Then for any $\alpha > 0$, we have

$$\lim_{n\to\infty} R_\alpha(S_n) = R_\alpha(G), \qquad \lim_{n\to\infty} T_\alpha(S_n) = T_\alpha(G).$$

Furthermore,

$$|R_\alpha(S_n) - R_\alpha(G)| = \begin{cases} O(n^{-1/2}), & 1 < \alpha < \infty; \\ O(n^{-\alpha/2+\gamma}), & 0 < \alpha < 1,\ 0 < \gamma < \alpha/2; \end{cases}$$

$$|T_\alpha(S_n) - T_\alpha(G)| = \begin{cases} O(n^{-1/2}), & 1 < \alpha < \infty; \\ O(n^{-\alpha/2+\gamma}), & 0 < \alpha < 1,\ 0 < \gamma < \alpha/2. \end{cases}$$

For discrete random variables, this kind of continuity is also
valid [20].

Although the Rényi entropy and Tsallis entropy can be
defined for any real number $\alpha$, we only consider the case
$\alpha > 0$. In fact, for $\alpha \le 0$, one can check that $R_\alpha(G) = \infty$
and $T_\alpha(G) = \infty$.
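For orientation, the Gaussian quantities in the Main Theorem have closed forms. The short computation below is an illustrative sketch based on the standard Gaussian integral, with $g$ denoting the density of $G$; it is not taken from the paper.

```latex
\int_{\mathbb{R}} g^{\alpha}(x)\,dx
  = (2\pi)^{-\alpha/2} \int_{\mathbb{R}} e^{-\alpha x^{2}/2}\,dx
  = (2\pi)^{(1-\alpha)/2}\,\alpha^{-1/2},
  \qquad \alpha > 0,
\\[4pt]
R_{\alpha}(G) = \frac{1}{2}\log(2\pi) + \frac{\log\alpha}{2(\alpha-1)},
\qquad
T_{\alpha}(G) = \frac{1 - (2\pi)^{(1-\alpha)/2}\,\alpha^{-1/2}}{\alpha-1},
\qquad \alpha \neq 1 .
```

For $\alpha \le 0$ the integrand $e^{-\alpha x^{2}/2}$ is at least 1 everywhere, so the integral diverges. Letting $\alpha \to 1$ in either formula gives $\frac{1}{2}\log(2\pi e) = H(G)$.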

The bounded moment condition 1) is equivalent to $E(X^k) <
+\infty$ for all positive integers $k$, which is also equivalent to the
fact that the characteristic function $\varphi(t)$ admits derivatives of
all orders at $t = 0$. It is a local condition imposed on $\varphi(t)$,
while the integrability condition 2) assumes a global property
of $\varphi(t)$.

As we shall see in Lemma 5, the bounded moment condition
1) implies the existence of $R_\alpha(S_n)$ and $T_\alpha(S_n)$ for all $\alpha > 0$.
We shall obtain the rates of convergence via Feller's expansion
of densities. According to Feller's expansion [18] (Lemma 3),
the density of $S_n$ converges to the density of $G$ uniformly
with rate $O(n^{-1/2})$, where the integrability condition 2) is
assumed. Hence, the rates of convergence obtained in the Main
Theorem are sharp.

Since the moment condition is weaker than the Poincaré
inequality condition which was used in the Shannon case [6],
the rate of convergence for Shannon entropy we obtained is
$O(n^{-1/2+\gamma})$ ($\gamma > 0$ is small) rather than $O(n^{-1})$.

If the moment condition 1) in the Main Theorem is replaced by
$E|X|^3 < \infty$, it is shown that $R_\alpha(S_n) \to R_\alpha(G)$ as $n \to +\infty$
for every $\alpha > 1$, and rough rates of convergence are obtained
in [19].¹
II. Convergence of Rényi entropy and Tsallis
entropy

Let $\{Y_n\}$ be a sequence of random variables with density
functions $\{p_n(x)\}$ and $Y$ be a random variable with density
function $p(x)$. It is interesting to ask whether the Rényi
entropy and Tsallis entropy of $\{Y_n\}$ of order $\alpha$ ($\alpha > 0$)
converge to the corresponding entropy of $Y$, provided
$Y_n \to Y$ in some sense. The following Theorem 1 claims
that if $\{p_n(x)\}$ is uniformly bounded, $\{p_n(x)\}$ converges
uniformly to $p(x)$, and the $L^{\alpha}$-norm of $\{p_n(x)\}$ is uniformly
bounded for every $\alpha > 0$, then the convergence results hold.

¹There is a small error in [19]; in fact, the rates of convergence claimed in
it should be divided by 2.



Theorem 1. Let $\{Y_n\}$ be a sequence of random variables
with density functions $\{p_n(x)\}$, $Y$ be a random variable with
density function $p(x)$, and $A$ be a subset of $\mathbb{R}$ with zero
Lebesgue measure. Suppose the following hold:

1) for any $\varepsilon > 0$, there exists a positive integer $N > 0$
such that $\sup_{x \in \mathbb{R}\setminus A} |p_n(x) - p(x)| < \varepsilon$ for $n > N$;

2) there exists a finite number $M > 0$ such that $p_n(x) < M$
uniformly in $x \in \mathbb{R}\setminus A$ and $n \in \mathbb{N}$;

3) for every $\alpha > 0$, there exists a finite number $M_\alpha > 0$
such that

$$\int_{\mathbb{R}} p_n^{\alpha}(x)\,dx < M_\alpha$$

uniformly in $n \in \mathbb{N}$.

Then for $\alpha > 0$, we have

$$\lim_{n\to+\infty} R_\alpha(Y_n) = R_\alpha(Y), \qquad \lim_{n\to+\infty} T_\alpha(Y_n) = T_\alpha(Y).$$



Remark 1:

1) It is interesting to note that one may use Theorem 1
to obtain the convergence results of Rényi entropy and
Tsallis entropy for random variables with correlations.

2) Theorem 1 claims the convergence of Rényi entropy
and Tsallis entropy for all $\alpha > 0$. One can assume
weaker conditions on the densities of $\{Y_n\}$ to ensure
the convergence for those $\alpha$ belonging to some bounded
and closed subset of $(0, \infty)$.

3) Condition 1) in Theorem 1 is equivalent to the fact
that $\{p_n(x)\}$ converges to $p(x)$ uniformly on $\mathbb{R}\setminus A$ as
$n \to +\infty$. Combining with condition 2) in Theorem 1,
we know that $p(x) \le M$. Moreover, in the proof of
Lemma 1 below we also obtain that for every $\alpha > 0$,

$$\int_{\mathbb{R}} p^{\alpha}(x)\,dx \le M_\alpha.$$

4) Since we are only interested in the asymptotic
behavior of $\{p_n(x)\}$, it is enough to require the uniform
boundedness of the $L^{\alpha}$-norm ($0 < \alpha < \infty$) of $\{p_n(x)\}$
for $n > N^*$, where $N^*$ is a positive integer.

We prepare two lemmas which are important in the proof
of Theorem 1.

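Lemma 1: Suppose that the conditions in Theorem 1 are
satisfied. Then for every $\alpha > 0$, we have

$$\lim_{n\to+\infty} \int_{\mathbb{R}} |p_n^{\alpha}(x) - p^{\alpha}(x)|\,dx = 0. \tag{1}$$

Proof: At first, we prove that for every $\alpha > 0$,
$\int_{\mathbb{R}} p^{\alpha}(x)\,dx \le M_\alpha$. Since for every $\alpha > 0$ the function
$f_\alpha(x) = x^{\alpha}$ is continuous and $\{p_n(x)\}$ converges to $p(x)$
uniformly on $\mathbb{R}\setminus A$ as $n \to +\infty$, we have that $\{p_n^{\alpha}(x)\}$
converges to $p^{\alpha}(x)$ uniformly on $\mathbb{R}\setminus A$ as $n \to +\infty$. By
Fatou's Lemma and condition 3) in Theorem 1, we conclude
that for every $\alpha > 0$,

$$\int_{\mathbb{R}} p^{\alpha}(x)\,dx = \int_{\mathbb{R}} \liminf_{n\to+\infty} p_n^{\alpha}(x)\,dx \le \liminf_{n\to+\infty} \int_{\mathbb{R}} p_n^{\alpha}(x)\,dx \le M_\alpha.$$

Next we prove (1), treating the cases $\alpha > 1$
and $0 < \alpha < 1$.

For the case $\alpha > 1$, using the Lagrange mean value theorem,
we have

$$\int_{\mathbb{R}} |p_n^{\alpha}(x) - p^{\alpha}(x)|\,dx = \alpha \int_{\mathbb{R}} |p_n(x) - p(x)|\,\xi_n^{\alpha-1}(x)\,dx \le \alpha \sup_{x \in \mathbb{R}\setminus A} |p_n(x) - p(x)| \int_{\mathbb{R}} \xi_n^{\alpha-1}(x)\,dx,$$

where $\min\{p_n(x), p(x)\} \le \xi_n(x) \le \max\{p_n(x), p(x)\}$.
Next, we'll prove that $\int_{\mathbb{R}} \xi_n^{\alpha-1}(x)\,dx$ is bounded.

$$\begin{aligned}
\int_{\mathbb{R}} \xi_n^{\alpha-1}(x)\,dx
&= \int_{\{x:\,p_n(x) \ge p(x)\}} \xi_n^{\alpha-1}(x)\,dx + \int_{\{x:\,p_n(x) < p(x)\}} \xi_n^{\alpha-1}(x)\,dx \\
&\le \int_{\{x:\,p_n(x) \ge p(x)\}} p_n^{\alpha-1}(x)\,dx + \int_{\{x:\,p_n(x) < p(x)\}} p^{\alpha-1}(x)\,dx \\
&\le \int_{\mathbb{R}} p_n^{\alpha-1}(x)\,dx + \int_{\mathbb{R}} p^{\alpha-1}(x)\,dx.
\end{aligned}$$

Using condition 3) in Theorem 1, we have
$\int_{\mathbb{R}} \xi_n^{\alpha-1}(x)\,dx \le 2M_{\alpha-1} < +\infty$ for every $\alpha \in (1, \infty)$.
Hence we obtain that

$$\int_{\mathbb{R}} |p_n^{\alpha}(x) - p^{\alpha}(x)|\,dx \le 2\alpha M_{\alpha-1} \sup_{x \in \mathbb{R}\setminus A} |p_n(x) - p(x)|. \tag{2}$$

Combining inequality (2) and condition 1) in Theorem
1, we obtain (1) for $\alpha > 1$.

Now we consider the case $0 < \alpha < 1$.

Condition 3) in Theorem 1 indicates that for every $\gamma \in
(0, \alpha/2)$, there exists an $M_{2\gamma} > 0$ such that $\int_{\mathbb{R}} p_n^{2\gamma}(x)\,dx < M_{2\gamma}$
and $\int_{\mathbb{R}} p^{2\gamma}(x)\,dx \le M_{2\gamma}$. It follows that

$$\begin{aligned}
\int_{\mathbb{R}} |p_n(x) + p(x)|^{2\gamma}\,dx
&= \int_{\{x:\,p_n(x) \ge p(x)\}} (p_n(x) + p(x))^{2\gamma}\,dx + \int_{\{x:\,p_n(x) < p(x)\}} (p_n(x) + p(x))^{2\gamma}\,dx \\
&\le \int_{\{x:\,p_n(x) \ge p(x)\}} (2p_n(x))^{2\gamma}\,dx + \int_{\{x:\,p_n(x) < p(x)\}} (2p(x))^{2\gamma}\,dx \\
&\le \int_{\mathbb{R}} 2^{2\gamma} p_n^{2\gamma}(x)\,dx + \int_{\mathbb{R}} 2^{2\gamma} p^{2\gamma}(x)\,dx \\
&\le 2^{2\gamma+1} M_{2\gamma} < \infty.
\end{aligned} \tag{3}$$

By inequality (3) and the trivial inequality $|b^{\alpha} - c^{\alpha}| \le
|b - c|^{\alpha}$ (for $b, c \ge 0$ and $0 < \alpha < 1$), we have

$$\begin{aligned}
\int_{\mathbb{R}} |p_n^{\alpha}(x) - p^{\alpha}(x)|\,dx
&\le \int_{\mathbb{R}} |p_n(x) - p(x)|^{\alpha}\,dx \\
&\le \int_{\mathbb{R}} |p_n(x) - p(x)|^{\alpha-2\gamma}\,|p_n(x) + p(x)|^{2\gamma}\,dx \\
&\le \sup_{x \in \mathbb{R}\setminus A} |p_n(x) - p(x)|^{\alpha-2\gamma} \int_{\mathbb{R}} |p_n(x) + p(x)|^{2\gamma}\,dx \\
&\le 2^{2\gamma+1} M_{2\gamma} \sup_{x \in \mathbb{R}\setminus A} |p_n(x) - p(x)|^{\alpha-2\gamma}.
\end{aligned} \tag{4}$$

Combining inequality (4) and condition 1) in Theorem
1, we obtain (1) for $0 < \alpha < 1$. ■

Lemma 2: Suppose that the conditions in Theorem 1 are
satisfied. Then for any $\varepsilon > 0$ there exists a $\delta > 0$ such that

$$|T_\alpha(Y_n) - H(Y_n)| < \varepsilon$$

for all $n \in \mathbb{N}$ and all $\alpha$ satisfying $|\alpha - 1| < \delta$.

Furthermore, for any $\varepsilon > 0$ there exists $\delta > 0$ such that

$$|T_\alpha(Y) - H(Y)| < \varepsilon$$

for all $\alpha$ satisfying $|\alpha - 1| < \delta$.

Proof: The proof is given in the Appendix. ■

It is time to give the proof of Theorem 1.

Proof: We consider two cases: $\alpha \neq 1$ and $\alpha = 1$.
Suppose $\alpha \neq 1$. By Lemma 1, for every $\alpha > 0$ there exists
$N > 0$ such that $\int_{\mathbb{R}} p_n^{\alpha}(x)\,dx \ge \frac{1}{2}\int_{\mathbb{R}} p^{\alpha}(x)\,dx$ for $n > N$.
Using the inequality $\log(1 + x) \le x$ ($x \ge 0$), we have

$$\begin{aligned}
|R_\alpha(Y_n) - R_\alpha(Y)|
&= \frac{1}{|1-\alpha|} \left| \log \frac{\int_{\mathbb{R}} p_n^{\alpha}(x)\,dx}{\int_{\mathbb{R}} p^{\alpha}(x)\,dx} \right| \\
&\le \max\left\{ \frac{\int_{\mathbb{R}} (p_n^{\alpha}(x) - p^{\alpha}(x))\,dx}{|1-\alpha| \int_{\mathbb{R}} p^{\alpha}(x)\,dx},\ \frac{\int_{\mathbb{R}} (p^{\alpha}(x) - p_n^{\alpha}(x))\,dx}{|1-\alpha| \int_{\mathbb{R}} p_n^{\alpha}(x)\,dx} \right\} \\
&\le \max\left\{ \frac{\int_{\mathbb{R}} (p_n^{\alpha}(x) - p^{\alpha}(x))\,dx}{|1-\alpha| \int_{\mathbb{R}} p^{\alpha}(x)\,dx},\ \frac{\int_{\mathbb{R}} (p^{\alpha}(x) - p_n^{\alpha}(x))\,dx}{|1-\alpha|\,\frac{1}{2} \int_{\mathbb{R}} p^{\alpha}(x)\,dx} \right\} \\
&\le \frac{2 \int_{\mathbb{R}} |p^{\alpha}(x) - p_n^{\alpha}(x)|\,dx}{|1-\alpha| \int_{\mathbb{R}} p^{\alpha}(x)\,dx}.
\end{aligned} \tag{5}$$

Combining (5) and Lemma 1, we obtain that for every $\alpha > 0$
and $\alpha \neq 1$,

$$\lim_{n\to\infty} |R_\alpha(Y_n) - R_\alpha(Y)| = 0.$$

It is obvious that

$$\begin{aligned}
|T_\alpha(Y_n) - T_\alpha(Y)|
&= \left| \frac{1}{\alpha-1}\left(1 - \int_{\mathbb{R}} p_n^{\alpha}(x)\,dx\right) - \frac{1}{\alpha-1}\left(1 - \int_{\mathbb{R}} p^{\alpha}(x)\,dx\right) \right| \\
&= \frac{1}{|\alpha-1|} \left| \int_{\mathbb{R}} (p_n^{\alpha}(x) - p^{\alpha}(x))\,dx \right| \\
&\le \frac{1}{|\alpha-1|} \int_{\mathbb{R}} |p_n^{\alpha}(x) - p^{\alpha}(x)|\,dx.
\end{aligned} \tag{6}$$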
We obtain that for every $\alpha > 0$ and $\alpha \neq 1$,

$$\lim_{n\to\infty} |T_\alpha(Y_n) - T_\alpha(Y)| = 0. \tag{7}$$

Now we consider the case $\alpha = 1$.

Remember that for any random variable $X$, $R_1(X) =
T_1(X) = H(X)$. By the triangle inequality,

$$|H(Y_n) - H(Y)| \le |H(Y_n) - T_{\alpha_0}(Y_n)| + |T_{\alpha_0}(Y_n) - T_{\alpha_0}(Y)| + |T_{\alpha_0}(Y) - H(Y)|. \tag{8}$$

Given any $\varepsilon > 0$, there exists $\delta_1 > 0$ such that $|H(Y_n) -
T_{\alpha_0}(Y_n)| < \varepsilon/3$ for all $n \in \mathbb{N}$ and all points $\alpha_0$ satisfying
$|\alpha_0 - 1| < \delta_1$, by Lemma 2. Similarly, there exists $\delta_2 > 0$ such that
$|T_{\alpha_0}(Y) - H(Y)| < \varepsilon/3$ for all points $\alpha_0$ satisfying $|\alpha_0 - 1| < \delta_2$.

By (7), we know that for every $\alpha_0 \neq 1$ satisfying $|\alpha_0 - 1| <
\min\{\delta_1, \delta_2\}$, there exists an $N > 0$ such that $|T_{\alpha_0}(Y_n) -
T_{\alpha_0}(Y)| < \varepsilon/3$ for all $n > N$.

According to inequality (8), we conclude that for any $\varepsilon > 0$,
there exists an $N$ such that $|H(Y_n) - H(Y)| < \varepsilon$ for $n > N$.

Therefore, $H(Y_n)$ converges to $H(Y)$. ■
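A simple family on which Theorem 1 can be checked by hand is $Y_n \sim N(0, 1 + 1/n)$ with limit $Y \sim N(0, 1)$: the densities are uniformly bounded, converge uniformly, and have uniformly bounded integrals of every positive power. The sketch below is illustrative only; it assumes the closed form of $\int p^{\alpha}$ for a centered Gaussian with standard deviation `sigma`, and the function names are hypothetical.

```python
import math


def gaussian_power_integral(alpha, sigma):
    """Closed form of the integral of the N(0, sigma^2) density raised to alpha (alpha > 0)."""
    return sigma ** (1.0 - alpha) * (2.0 * math.pi) ** ((1.0 - alpha) / 2.0) / math.sqrt(alpha)


def renyi(alpha, sigma):
    """Renyi entropy of order alpha != 1 of N(0, sigma^2)."""
    return math.log(gaussian_power_integral(alpha, sigma)) / (1.0 - alpha)


def tsallis(alpha, sigma):
    """Tsallis entropy of order alpha != 1 of N(0, sigma^2)."""
    return (1.0 - gaussian_power_integral(alpha, sigma)) / (alpha - 1.0)


def entropy_gaps(alpha, n):
    """Return (|R_alpha(Y_n) - R_alpha(Y)|, |T_alpha(Y_n) - T_alpha(Y)|)."""
    sigma_n = math.sqrt(1.0 + 1.0 / n)
    renyi_gap = abs(renyi(alpha, sigma_n) - renyi(alpha, 1.0))
    tsallis_gap = abs(tsallis(alpha, sigma_n) - tsallis(alpha, 1.0))
    return renyi_gap, tsallis_gap


if __name__ == "__main__":
    for alpha in (0.5, 2.0):
        for n in (10, 100, 1000, 10000):
            print(alpha, n, entropy_gaps(alpha, n))
```

In this family the Rényi gap equals $\frac{1}{2}\log(1 + 1/n)$ for every order, so both gaps shrink to zero as $n$ grows, in line with the statement of Theorem 1.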



III. Proof of Main Theorem 
Lemma 3: (Expansions for densities, Feller [18]) Let
$X_1, X_2, \ldots, X_n$ be independent copies of a random variable
$X$ with characteristic function $\varphi(t)$, $f_n$ be the density function
of the normalized sum $S_n$, and $g$ be the density of the standard
Gaussian distribution $G$. Suppose that $E|X|^3 = \rho < \infty$, and
that $|\varphi|^{\nu}$ is integrable for some $\nu \ge 1$. Then as $n \to \infty$

$$f_n(x) - g(x) - \frac{\rho}{6\sqrt{n}}(x^3 - 3x)\,g(x) = o\left(\frac{1}{\sqrt{n}}\right) \tag{9}$$

uniformly in $x$.

Remark 2: Since $g(x)$ is the density function of the
standard Gaussian distribution $G$, $g(x) \le \frac{1}{\sqrt{2\pi}} < 1$. For every
$\alpha > 0$, there exists an $M'_\alpha > 0$ such that $\int_{\mathbb{R}} g^{\alpha}(x)\,dx < M'_\alpha$.
According to Lemma 3, we have

$$\sup_{x \in \mathbb{R}} |f_n(x) - g(x)| = O(n^{-1/2}). \tag{10}$$

Using (10), we have that there exists an $N > 0$ such that
$f_n(x) < 1$ uniformly for $n > N$ and $x \in \mathbb{R}$. Hence, without
loss of generality, we can suppose that $f_n(x) < 1$ uniformly
in $n \in \mathbb{N}$ and $x \in \mathbb{R}$.
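The uniform rate in Remark 2 can be observed numerically on a concrete example. The sketch below assumes $X = E - 1$ with $E$ exponentially distributed with rate 1 (mean 0, variance 1, all moments finite, and $|\varphi|^2$ integrable), so that the sum of $n$ copies of $E$ has a Gamma$(n, 1)$ density. This choice of distribution, the grid, and the function names are assumptions made for illustration and do not appear in the paper.

```python
import math


def gaussian_pdf(x):
    """Density g of the standard Gaussian distribution."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def normalized_sum_pdf(x, n):
    """Density f_n of S_n = (E_1 + ... + E_n - n) / sqrt(n) with E_i ~ Exp(1)."""
    y = n + math.sqrt(n) * x  # corresponding value of the Gamma(n, 1) sum
    if y <= 0.0:
        return 0.0
    # log of sqrt(n) * y^(n-1) * exp(-y) / Gamma(n), computed in log space
    log_pdf = 0.5 * math.log(n) + (n - 1) * math.log(y) - y - math.lgamma(n)
    return math.exp(log_pdf)


def sup_distance(n, lo=-8.0, hi=8.0, steps=16_000):
    """Grid approximation of sup_x |f_n(x) - g(x)|."""
    h = (hi - lo) / steps
    return max(
        abs(normalized_sum_pdf(lo + i * h, n) - gaussian_pdf(lo + i * h))
        for i in range(steps + 1)
    )


def scaled_sup_distance(n):
    """sqrt(n) * sup_x |f_n(x) - g(x)|, which should stay bounded if (10) holds."""
    return math.sqrt(n) * sup_distance(n)


if __name__ == "__main__":
    for n in (100, 400, 1600):
        print(n, sup_distance(n), scaled_sup_distance(n))
```

For this example the scaled quantity stays of the same order as $n$ grows, while the unscaled distance shrinks, which is what an $O(n^{-1/2})$ uniform rate predicts.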

Next, we'll prove that $f_n(x)$ satisfies condition 3) in
Theorem 1.

Lemma 4: (Theorem 2.10 [21]) Let $X_1, X_2, \ldots, X_n$ be
independent random variables with zero means, and let $k \ge 2$
and $Z_n = X_1 + X_2 + \ldots + X_n$. Then

$$E|Z_n|^k \le C(k)\, n^{k/2-1} \sum_{i=1}^{n} E|X_i|^k,$$

where $C(k)$ is a positive constant depending only on $k$.

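Lemma 5: Let $X_1, X_2, \ldots, X_n$ be independent copies of a
random variable $X$ with probability density $f$ and characteristic
function $\varphi(t)$, and let $f_n(x)$ be the density of the normalized sum
$S_n$. Suppose $E(X) = 0$ and $E(|X|^k) = \rho_k$, where $k = 1, 2, \ldots$ and
$\rho_k$ is finite. Then for each positive $\alpha$, there exists an $M''_\alpha > 0$
such that

$$\int_{\mathbb{R}} f_n^{\alpha}(x)\,dx < M''_\alpha.$$

Proof: We distinguish three cases: $\alpha = 1$, $\alpha > 1$ and
$0 < \alpha < 1$.

The case $\alpha = 1$ is obvious because $\{f_n\}$ are densities. If
$\alpha > 1$, noting that $\sup f_n(x) < 1$ in Remark 2, we have

$$\int_{\mathbb{R}} f_n^{\alpha}(x)\,dx = \int_{\mathbb{R}} f_n^{\alpha-1}(x) f_n(x)\,dx \le 1.$$

If $0 < \alpha < 1$, one can choose an integer $k \ge 2$ such that
$\frac{1}{k+1} < \alpha$. From Lemma 4 we obtain that

$$\begin{aligned}
E|S_n|^k &= E\left| (X_1 + X_2 + \ldots + X_n)/n^{1/2} \right|^k \\
&= E|Z_n|^k / n^{k/2} \\
&\le C(k)\, n^{k/2-1} \Bigl( \sum_{i=1}^{n} E|X_i|^k \Bigr) / n^{k/2} \\
&= C(k)\, E|X|^k = C(k)\rho_k < \infty.
\end{aligned}$$

On the other hand, noting that $\sup_{x \in \mathbb{R},\, n \in \mathbb{N}} f_n(x) < 1$ in Remark 2,
we have that

$$\begin{aligned}
\int_{\mathbb{R}} f_n^{\alpha}(x)\,dx
&= \int_{|x| \le 1} f_n^{\alpha}(x)\,dx + \int_{|x| > 1} f_n^{\alpha}(x)\,|x|^{k\alpha}\,|x|^{-k\alpha}\,dx \\
&\le 2 + \Bigl( \int_{|x| > 1} f_n(x)\,|x|^{k}\,dx \Bigr)^{\alpha} \Bigl( \int_{|x| > 1} |x|^{-\frac{k\alpha}{1-\alpha}}\,dx \Bigr)^{1-\alpha} \\
&\le 2 + \bigl(C(k)\rho_k\bigr)^{\alpha} \Bigl( \int_{|x| > 1} |x|^{-\frac{k\alpha}{1-\alpha}}\,dx \Bigr)^{1-\alpha},
\end{aligned}$$

where the first inequality follows from the Hölder inequality.

Since $\int_{|x| > 1} |x|^{-\frac{k\alpha}{1-\alpha}}\,dx$ is finite for $\frac{1}{k+1} < \alpha < 1$, one can
find a positive constant $M''_\alpha$ independent of $n$ such that

$$\int_{\mathbb{R}} f_n^{\alpha}(x)\,dx < M''_\alpha.$$

■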

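Lemma 6: If $0 \le y, x \le 1$, then for every $\gamma \in (0, \frac{1}{2})$ there
exists a $Q_\gamma > 0$ such that

$$Q_\gamma\,|x^{1-\gamma} - y^{1-\gamma}| \ge |x^{\gamma} y^{1-\gamma} \log x - y^{\gamma} x^{1-\gamma} \log y|.$$

Proof: The proof is given in the Appendix. ■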
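The inequality of Lemma 6, $Q_\gamma |x^{1-\gamma} - y^{1-\gamma}| \ge |x^{\gamma} y^{1-\gamma} \log x - y^{\gamma} x^{1-\gamma} \log y|$ for $x, y \in [0, 1]$ and $\gamma \in (0, \frac{1}{2})$, can be spot-checked on a grid. The sketch below is illustrative: it assumes the particular constant $Q_\gamma = \sup_{[0,1]} d_\gamma / (1 - \gamma)$ with $d_\gamma(x) = x^{\gamma} - (1 - 2\gamma) x^{\gamma} \log x$, which is one admissible choice rather than the paper's stated value, and the helper names are hypothetical.

```python
import math


def d_gamma(x, gamma):
    """d_gamma(x) = x^gamma - (1 - 2*gamma) * x^gamma * log(x); equals 0 at x = 0 by continuity."""
    if x == 0.0:
        return 0.0
    return x ** gamma * (1.0 - (1.0 - 2.0 * gamma) * math.log(x))


def q_gamma(gamma):
    """One admissible constant: sup of d_gamma over [0, 1], divided by (1 - gamma)."""
    # Stationary point of d_gamma: log x = (3*gamma - 1) / (gamma * (1 - 2*gamma)).
    log_x_star = (3.0 * gamma - 1.0) / (gamma * (1.0 - 2.0 * gamma))
    # If the stationary point lies beyond 1, d_gamma is increasing on (0, 1].
    x_star = math.exp(log_x_star) if log_x_star < 0.0 else 1.0
    return d_gamma(x_star, gamma) / (1.0 - gamma)


def lemma6_sides(x, y, gamma, q):
    """Left and right sides of the Lemma 6 inequality for x, y in (0, 1]."""
    lhs = q * abs(x ** (1.0 - gamma) - y ** (1.0 - gamma))
    rhs = abs(
        x ** gamma * y ** (1.0 - gamma) * math.log(x)
        - y ** gamma * x ** (1.0 - gamma) * math.log(y)
    )
    return lhs, rhs


def lemma6_holds_on_grid(gamma, points=200, tol=1e-12):
    """Check the inequality on a uniform grid of (0, 1] x (0, 1]."""
    q = q_gamma(gamma)
    grid = [(i + 1) / points for i in range(points)]
    for x in grid:
        for y in grid:
            lhs, rhs = lemma6_sides(x, y, gamma, q)
            if lhs + tol < rhs:
                return False
    return True


if __name__ == "__main__":
    for gamma in (0.1, 0.25, 0.4):
        print(gamma, q_gamma(gamma), lemma6_holds_on_grid(gamma))
```

A grid check is not a proof, but it is a quick way to catch a sign or exponent error when transcribing the inequality.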

Proof of Main Theorem. 

Proof: From Remark 2, we know that $f_n(x)$ satisfies
condition 2) in Theorem 1, and $f_n(x)$, $g(x)$ satisfy
condition 1) in Theorem 1. By Lemma 5 and letting
$M_\alpha = \max\{M'_\alpha, M''_\alpha\}$, we obtain that for every $\alpha > 0$,
$\int_{\mathbb{R}} f_n^{\alpha}(x)\,dx$ and $\int_{\mathbb{R}} g^{\alpha}(x)\,dx$ are less than $M_\alpha$ uniformly in
$n \in \mathbb{N}$, so $\{f_n(x)\}$ satisfies condition 3) in Theorem 1. Hence from
Theorem 1 we obtain that

$$\lim_{n\to\infty} R_\alpha(S_n) = R_\alpha(G), \qquad \lim_{n\to\infty} T_\alpha(S_n) = T_\alpha(G).$$

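Next, we'll study the rates of convergence for $R_\alpha(S_n)$ and
$T_\alpha(S_n)$.

At first, we consider the case $\alpha > 0$ and $\alpha \neq 1$.
Using the inequality $\log(1 + x) \le x$ ($x \ge 0$), and
inequalities (5), (2), (4), we have

$$|R_\alpha(S_n) - R_\alpha(G)| \le \frac{2 \int_{\mathbb{R}} |f_n^{\alpha}(x) - g^{\alpha}(x)|\,dx}{|1-\alpha| \int_{\mathbb{R}} g^{\alpha}(x)\,dx} \le \begin{cases} \dfrac{4\alpha M_{\alpha-1} \sup_x |f_n(x) - g(x)|}{(\alpha-1) \int_{\mathbb{R}} g^{\alpha}(x)\,dx}, & \alpha \in (1, +\infty); \\[2ex] \dfrac{2^{2\gamma+2} M_{2\gamma} \sup_x |f_n(x) - g(x)|^{\alpha-2\gamma}}{(1-\alpha) \int_{\mathbb{R}} g^{\alpha}(x)\,dx}, & \alpha \in (0, 1),\ \gamma \in (0, \alpha/2). \end{cases} \tag{11}$$

Combining (10) and (11), we obtain

$$|R_\alpha(S_n) - R_\alpha(G)| = \begin{cases} O(n^{-1/2}), & \alpha \in (1, +\infty); \\ O(n^{-\alpha/2+\gamma}), & \alpha \in (0, 1),\ \gamma \in (0, \alpha/2). \end{cases}$$

On the other hand, using the inequalities (6), (2), (4), we
have

$$|T_\alpha(S_n) - T_\alpha(G)| = \left| \frac{1}{\alpha-1} \int_{\mathbb{R}} (f_n^{\alpha}(x) - g^{\alpha}(x))\,dx \right| \le \frac{1}{|\alpha-1|} \int_{\mathbb{R}} |f_n^{\alpha}(x) - g^{\alpha}(x)|\,dx \le \begin{cases} \dfrac{2\alpha M_{\alpha-1} \sup_x |f_n(x) - g(x)|}{\alpha-1}, & \alpha \in (1, +\infty); \\[2ex] \dfrac{2^{2\gamma+1} M_{2\gamma} \sup_x |f_n(x) - g(x)|^{\alpha-2\gamma}}{1-\alpha}, & \alpha \in (0, 1),\ \gamma \in (0, \alpha/2). \end{cases} \tag{12}$$

Combining (10) and (12), we obtain

$$|T_\alpha(S_n) - T_\alpha(G)| = \begin{cases} O(n^{-1/2}), & \alpha \in (1, +\infty); \\ O(n^{-\alpha/2+\gamma}), & \alpha \in (0, 1),\ \gamma \in (0, \alpha/2). \end{cases}$$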

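Now we investigate the case $\alpha = 1$.
Observe that for any random variable $X$,

$$R_1(X) = T_1(X) = H(X).$$

In what follows we show the following:

$$|H(S_n) - H(G)| = O(n^{-1/2+\gamma}), \quad \text{for } \gamma \in (0, \tfrac{1}{2}). \tag{13}$$

For $\gamma \in (0, \frac{1}{2})$, we have

$$\begin{aligned}
n^{1/2-\gamma}\,|H(S_n) - H(G)|
&= n^{1/2-\gamma} \left| \int_{\mathbb{R}} f_n(x) \log f_n(x)\,dx - \int_{\mathbb{R}} g(x) \log g(x)\,dx \right| \\
&= n^{1/2-\gamma} \Bigl| \int_{\mathbb{R}} \bigl[ f_n(x) \log f_n(x) - f_n^{\gamma}(x) g^{1-\gamma}(x) \log f_n(x) \bigr] dx \\
&\qquad + \int_{\mathbb{R}} \bigl[ f_n^{\gamma}(x) g^{1-\gamma}(x) \log f_n(x) - f_n^{1-\gamma}(x) g^{\gamma}(x) \log g(x) \bigr] dx \\
&\qquad + \int_{\mathbb{R}} \bigl[ f_n^{1-\gamma}(x) g^{\gamma}(x) \log g(x) - g(x) \log g(x) \bigr] dx \Bigr| \\
&\le n^{1/2-\gamma} \bigl( |J_1(n)| + |J_2(n)| + |J_3(n)| \bigr),
\end{aligned} \tag{14}$$

where

$$J_1(n) = \int_{\mathbb{R}} \bigl[ f_n(x) \log f_n(x) - f_n^{\gamma}(x) g^{1-\gamma}(x) \log f_n(x) \bigr] dx,$$

$$J_2(n) = \int_{\mathbb{R}} \bigl[ f_n^{\gamma}(x) g^{1-\gamma}(x) \log f_n(x) - f_n^{1-\gamma}(x) g^{\gamma}(x) \log g(x) \bigr] dx,$$

$$J_3(n) = \int_{\mathbb{R}} \bigl[ f_n^{1-\gamma}(x) g^{\gamma}(x) \log g(x) - g(x) \log g(x) \bigr] dx.$$

By Lemma 3, for $\varepsilon = \frac{1}{2}$, there exists an $N_1 > 0$ such that
for each $n > N_1$, we have

$$n^{1/2}\,|f_n(x) - g(x)| \le \max\left\{ \left| \tfrac{\rho}{6}(x^3 - 3x) g(x) + \tfrac{1}{2} \right|,\ \left| \tfrac{\rho}{6}(x^3 - 3x) g(x) - \tfrac{1}{2} \right| \right\} := m(x) \le \max_{x \in \mathbb{R}} \{m(x)\} := C_1.$$

Therefore, we have

$$\begin{aligned}
n^{1/2-\gamma}\,|J_1(n)|
&= n^{1/2-\gamma} \left| \int_{\mathbb{R}} f_n(x) \log f_n(x)\,dx - \int_{\mathbb{R}} f_n^{\gamma}(x) g^{1-\gamma}(x) \log f_n(x)\,dx \right| \\
&\le n^{1/2-\gamma} \int_{\mathbb{R}} f_n^{\gamma}(x)\,|f_n^{1-\gamma}(x) - g^{1-\gamma}(x)|\,|\log f_n(x)|\,dx \\
&\le n^{-\gamma/2} \int_{\mathbb{R}} f_n^{\gamma}(x)\, n^{(1-\gamma)/2}\,|f_n(x) - g(x)|^{1-\gamma}\,|\log f_n(x)|\,dx \\
&\le \int_{\mathbb{R}} f_n^{\gamma}(x)\,(m(x))^{1-\gamma}\,|\log f_n(x)|\,dx \\
&\le C_1^{1-\gamma} \int_{\mathbb{R}} f_n^{\gamma/2}(x)\,|f_n^{\gamma/2}(x) \log f_n(x)|\,dx.
\end{aligned}$$

By Lemma 5, there exists $M_1 > 0$ such that $\int_{\mathbb{R}} f_n^{\gamma/2}(x)\,dx <
M_1$. On the other hand, $f_n(x) < 1$ uniformly
in $n \in \mathbb{N}$, $x \in \mathbb{R}$ by Remark 2; hence when $n >
N_1$, $n^{1/2-\gamma}\,|J_1(n)| \le M_1 C_1^{1-\gamma} C_2 < +\infty$, where $C_2 =
\sup |f_n^{\gamma/2}(x) \log f_n(x)|$.

Using similar arguments, we can obtain that there exist
$N_2, C_3 > 0$ such that

$$n^{1/2-\gamma}\,|J_3(n)| \le C_3 \quad \text{for } n > N_2.$$

According to Lemma 6, for every $\gamma \in (0, \frac{1}{2})$, there
exists a $Q_\gamma > 0$ such that $Q_\gamma\,|f_n^{1-\gamma}(x) - g^{1-\gamma}(x)| \ge
|f_n^{\gamma}(x) g^{1-\gamma}(x) \log f_n(x) - g^{\gamma}(x) f_n^{1-\gamma}(x) \log g(x)|$.
It follows that

$$\begin{aligned}
n^{1/2-\gamma}\,|J_2(n)|
&= n^{1/2-\gamma} \left| \int_{\mathbb{R}} f_n^{\gamma}(x) g^{1-\gamma}(x) \log f_n(x)\,dx - \int_{\mathbb{R}} f_n^{1-\gamma}(x) g^{\gamma}(x) \log g(x)\,dx \right| \\
&\le n^{1/2-\gamma}\, Q_\gamma \int_{\mathbb{R}} |f_n^{1-\gamma}(x) - g^{1-\gamma}(x)|\,dx.
\end{aligned}$$

Combining inequalities (3), (4) and equality (10), we have

$$\int_{\mathbb{R}} |f_n^{1-\gamma}(x) - g^{1-\gamma}(x)|\,dx = O(n^{-1/2+\gamma}).$$

Hence, there exists a $C_4 > 0$ such that $n^{1/2-\gamma}\,|J_2(n)| \le C_4$ for
$n > N_3$.

From the above discussion and inequality (14), there exists a
constant $C := M_1 C_1^{1-\gamma} C_2 + C_3 + C_4$ such that

$$n^{1/2-\gamma} \left| \int_{\mathbb{R}} f_n(x) \log f_n(x)\,dx - \int_{\mathbb{R}} g(x) \log g(x)\,dx \right| \le C$$

for $n > \max\{N_1, N_2, N_3\}$.
Hence, (13) is true. ■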



IV. Conclusion and Discussion

We show the convergence of the normalized sum of IID
continuous random variables with bounded moments of all orders
in the sense of Rényi entropy and Tsallis entropy, and obtain
sharp rates of convergence. By using Feller's expansion and
detailed analytical properties of the corresponding densities,
we estimate the Rényi entropy and Tsallis entropy directly.
The main difficulty lies in the case of Shannon entropy, both
for the convergence and for the rate of convergence, because $\alpha = 1$
is a singularity of $R_\alpha$ and $T_\alpha$. We circumvent it by obtaining
some uniform estimates near $\alpha = 1$. Compared with the
previous proof for Shannon entropy, our proof is more direct,
can be generalized to random vectors in higher dimensions, and
may be used to consider the convergence of normalized sums of
dependent random variables. The bounded moment condition
we used is weaker than the Poincaré constant condition in the
Shannon case [6]. As a result, the rate of convergence is
slower than $O(\frac{1}{n})$ in the Shannon case. It is interesting to consider
the convergence and rates of convergence for Rényi divergence,
as Rényi raised in [10].

V. Appendix 

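Proof of Lemma 2

Proof: Observing that

$$\frac{d\bigl(1 - \int_{\mathbb{R}} p_n^{t}(x)\,dx\bigr)}{dt} = -\int_{\mathbb{R}} p_n^{t}(x) \log p_n(x)\,dx$$

and $\bigl(1 - \int_{\mathbb{R}} p_n^{t}(x)\,dx\bigr)\big|_{t=1} = 0$, we have that

$$1 - \int_{\mathbb{R}} p_n^{\alpha}(x)\,dx = -\int_{1}^{\alpha} \int_{\mathbb{R}} p_n^{t}(x) \log p_n(x)\,dx\,dt. \tag{15}$$

From equality (15) we have that

$$\begin{aligned}
T_\alpha(Y_n) - H(Y_n)
&= \frac{1 - \int_{\mathbb{R}} p_n^{\alpha}(x)\,dx}{\alpha - 1} + \int_{\mathbb{R}} p_n(x) \log p_n(x)\,dx \\
&= \int_{1}^{\alpha} \frac{\int_{\mathbb{R}} p_n(x) \log p_n(x)\,dx - \int_{\mathbb{R}} p_n^{t}(x) \log p_n(x)\,dx}{\alpha - 1}\,dt \\
&= \int_{1}^{\alpha} \frac{J_n(t)}{\alpha - 1}\,dt,
\end{aligned}$$

where

$$J_n(t) = \int_{\mathbb{R}} p_n(x) \log p_n(x)\,dx - \int_{\mathbb{R}} p_n^{t}(x) \log p_n(x)\,dx.$$

It follows that

$$|T_\alpha(Y_n) - H(Y_n)| \le \frac{1}{|1-\alpha|} \left| \int_{1}^{\alpha} |1 - t| \left| \frac{J_n(t)}{1 - t} \right| dt \right|. \tag{16}$$

Next, we'll prove that $|J_n(t)/(1 - t)|$ is bounded uniformly in
$t \in [\frac{1}{2}, \frac{3}{2}]$ and $n \in \mathbb{N}$.

Using the Lagrange mean value theorem, for fixed $n$ and $x$,
there exists $\xi_n(x)$ which is between $t$ and 1 and in $[\frac{1}{2}, \frac{3}{2}]$ such
that $p_n^{t}(x) - p_n(x) = p_n^{\xi_n(x)}(x) \log p_n(x)\,(t - 1)$. We have

$$\begin{aligned}
|J_n(t)/(1 - t)|
&= \left| \frac{\int_{\mathbb{R}} (p_n(x) - p_n^{t}(x)) \log p_n(x)\,dx}{1 - t} \right| = \left| \int_{\mathbb{R}} p_n^{\xi_n(x)}(x) \log^2 p_n(x)\,dx \right| \\
&\le \int_{p_n(x) < 1} |p_n^{\xi_n(x)}(x) \log^2 p_n(x)|\,dx + \int_{p_n(x) \ge 1} |p_n^{\xi_n(x)}(x) \log^2 p_n(x)|\,dx \\
&\le \int_{p_n(x) < 1} |p_n^{1/2}(x) \log^2 p_n(x)|\,dx + \int_{p_n(x) \ge 1} |p_n^{3/2}(x) \log^2 p_n(x)|\,dx \\
&\le \int_{\mathbb{R}} p_n^{1/4}(x)\,|p_n^{1/4}(x) \log^2 p_n(x)|\,dx + \int_{\mathbb{R}} p_n(x)\,|p_n^{1/2}(x) \log^2 p_n(x)|\,dx \\
&\le B\Bigl( \int_{\mathbb{R}} p_n^{1/4}(x)\,dx + 1 \Bigr),
\end{aligned}$$

where

$$B = \sup_{x \in \mathbb{R}\setminus A,\ n \in \mathbb{N}} \max\bigl\{ |p_n^{1/4}(x) \log^2 p_n(x)|,\ |p_n^{1/2}(x) \log^2 p_n(x)| \bigr\} < \infty$$

by condition 2) of Theorem 1. From condition 3) in Theorem
1, we know that $\int_{\mathbb{R}} p_n^{1/4}(x)\,dx < M_{1/4}$. Letting $L = B(1 +
M_{1/4})$, we obtain that $|J_n(t)/(1 - t)| \le L$ for $n \in \mathbb{N}$.
Combining with inequality (16), we have that

$$|T_\alpha(Y_n) - H(Y_n)| \le \frac{1}{|1-\alpha|} \left| \int_{1}^{\alpha} |t - 1| \left| \frac{J_n(t)}{1 - t} \right| dt \right| \le \frac{1}{|1-\alpha|} \left| \int_{1}^{\alpha} L\,|t - 1|\,dt \right| = \frac{L}{2}\,|1 - \alpha| \le L\,|1 - \alpha|.$$

Therefore, for any $\varepsilon > 0$, there exists a $\delta = \min\{\frac{\varepsilon}{L}, \frac{1}{2}\}$
such that $|T_\alpha(Y_n) - H(Y_n)| < \varepsilon$ uniformly for all $n \in \mathbb{N}$ and
all points $\alpha$ satisfying $|\alpha - 1| < \delta$ and $\frac{1}{2} < \alpha < \frac{3}{2}$.

The conclusion 2) can be obtained by similar arguments. ■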

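Proof of Lemma 6

Proof: We just prove the case $0 \le y < x \le 1$; the proof
of the case $0 \le x < y \le 1$ is similar.

Suppose that $d_\gamma(x) = x^{\gamma} - (1 - 2\gamma) x^{\gamma} \log x$. Since $d_\gamma(x)$
is continuous in $x \in [0, 1]$, for every $\gamma \in (0, \frac{1}{2})$, $d_\gamma(x)$ is
bounded in $x \in [0, 1]$. Thus, for every $\gamma \in (0, \frac{1}{2})$, there exists
a $Q_\gamma > 0$ such that $|d_\gamma(x)| \le (1 - \gamma) Q_\gamma$.

For $0 \le y < x \le 1$, we have $x^{1-\gamma} > y^{1-\gamma}$ and

$$x^{\gamma} y^{1-\gamma} \log x - y^{\gamma} x^{1-\gamma} \log y \ge -x^{\gamma} y^{\gamma} (x^{1-2\gamma} - y^{1-2\gamma}) \log x \ge 0.$$

It follows that $Q_\gamma\,|x^{1-\gamma} - y^{1-\gamma}| = Q_\gamma (x^{1-\gamma} - y^{1-\gamma})$ and
$|x^{\gamma} y^{1-\gamma} \log x - y^{\gamma} x^{1-\gamma} \log y| = x^{\gamma} y^{1-\gamma} \log x - y^{\gamma} x^{1-\gamma} \log y$.
Denote

$$\begin{aligned}
A_\gamma(x) &:= Q_\gamma (x^{1-\gamma} - y^{1-\gamma}) - (x^{\gamma} y^{1-\gamma} \log x - y^{\gamma} x^{1-\gamma} \log y) \\
&= x^{1-\gamma} y^{1-\gamma} \bigl( Q_\gamma y^{\gamma-1} - Q_\gamma x^{\gamma-1} - x^{2\gamma-1} \log x + y^{2\gamma-1} \log y \bigr) \\
&:= x^{1-\gamma} y^{1-\gamma} F_\gamma(x),
\end{aligned}$$

where

$$F_\gamma(x) := Q_\gamma y^{\gamma-1} - Q_\gamma x^{\gamma-1} - x^{2\gamma-1} \log x + y^{2\gamma-1} \log y.$$

In what follows, we show that $A_\gamma(x) \ge 0$ for $0 \le y < x \le
1$, which implies the Lemma is true for $0 \le y < x \le 1$.

Obviously, if $y = 0$, $A_\gamma(x) = Q_\gamma x^{1-\gamma} > 0$. If $y > 0$, since
$F_\gamma(y) = 0$ and

$$\begin{aligned}
F'_\gamma(x) &= (1 - \gamma) Q_\gamma x^{\gamma-2} + (1 - 2\gamma) x^{2\gamma-2} \log x - x^{2\gamma-2} \\
&= x^{\gamma-2} \bigl\{ (1 - \gamma) Q_\gamma - [x^{\gamma} - (1 - 2\gamma) x^{\gamma} \log x] \bigr\} \\
&= x^{\gamma-2} \bigl\{ (1 - \gamma) Q_\gamma - d_\gamma(x) \bigr\} \ge 0,
\end{aligned}$$

we obtain for every $\gamma \in (0, \frac{1}{2})$ and $y > 0$, $F_\gamma(x) \ge 0$ when
$x > y$. As a result, $A_\gamma(x) \ge 0$. ■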